Search CORE

2 research outputs found

n-gram Frequency Ranking with additional sources of information in a multiple-Gaussian classifier for Language Identification

Author: Córdoba Herralde, Ricardo de
D'haro Enríquez, Luis Fernando
Lucas Cuesta, Juan Manuel
Zugasti Raposo, Javier
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2008

We present new results for our n-gram frequency ranking approach to language identification. We use a parallel phone recognizer (as in PPRLM), but instead of a language model we build a ranking of the most frequent n-grams. We then compute the distance between the ranking of the input sentence and the ranking of each language, based on the difference in the relative position of each n-gram. The aim of this ranking is to model reliably a longer span than PPRLM. The approach outperforms PPRLM (15% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We also show that combining this technique with other sources of information (feature vectors in our classifier) is advantageous over PPRLM, and we provide a detailed analysis of the relevance of these sources together with a simple feature selection technique to cope with long feature vectors. The test database has been significantly enlarged using cross-fold validation, so comparisons are now more reliable.
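The ranking-and-distance idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngram_ranking`, `ranking_distance`, `identify`), the `top_k` cutoff, the fixed penalty for n-grams missing from a language ranking, and the averaging over input n-grams are all assumptions made for illustration. The phone sequences are assumed to be already decoded into lists of phone symbols, and the multiple-Gaussian classifier and the additional feature sources are not modelled.

```python
from collections import Counter


def ngram_ranking(tokens, n_values=(1, 2, 3), top_k=100):
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Sort by descending count; ties are broken by the n-gram itself
    # so that the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ordered[:top_k])}


def ranking_distance(input_rank, language_rank, penalty=None):
    """Average absolute difference in rank position over the input n-grams.

    N-grams absent from the language ranking receive a fixed penalty
    (assumption: the size of the language ranking).
    """
    if penalty is None:
        penalty = len(language_rank)
    if not input_rank:
        return float(penalty)
    total = 0
    for gram, pos in input_rank.items():
        if gram in language_rank:
            total += abs(pos - language_rank[gram])
        else:
            total += penalty
    return total / len(input_rank)


def identify(tokens, language_ranks, **ranking_kwargs):
    """Return the language whose ranking is closest to the input ranking."""
    input_rank = ngram_ranking(tokens, **ranking_kwargs)
    return min(
        language_ranks,
        key=lambda lang: ranking_distance(input_rank, language_ranks[lang]),
    )


# Toy phone sequences standing in for recognizer output (hypothetical data).
train_a = "a b a b a b c".split()
train_b = "x y x y z x y".split()
language_ranks = {"A": ngram_ranking(train_a), "B": ngram_ranking(train_b)}
guess = identify("a b a b".split(), language_ranks)
```

To include the 4-grams and 5-grams that the abstract credits for the improvement, pass `n_values=(1, 2, 3, 4, 5)` both when building the language rankings and when calling `identify`.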

Archivo Digital UPM

Approach to Understanding the Botnet Phenomenon” from Moheeb Abu Rajab,

Author: Andreas Terzis
Supervisor: Gregor Maier
Computer Science
Fabian Monrose
Javier Zugasti Raposo
Jay Zarfoss

The following text is a summary of the original paper: “A Multifacete

CiteSeerX